Each of these pages provides an analysis run through for a different type of design. Each document is structured in the same way:

  • First the data and research context is introduced. For the purpose of these tutorials, we will only use examples where the data can be shared - either because it is from an open access publication, or because it is unpublished or simulated.
  • Second, we go through any tidying of the data that is required, before creating some brief descriptives and visualizations of the raw data.
  • Then, we conduct an analysis. Where possible, we translate the research questions into formal equations prior to fitting the models in lme4. Model comparisons are conducted, along with checks of distributional assumptions on our model residuals.
  • Finally, we visualize and interpret our analysis.

Please note that there will be only minimal explanation of the steps undertaken here, as these pages are intended as example analyses rather than additional labs readings. Please also be aware that there are many decisions to be made throughout conducting analyses, and it may be the case that you disagree with some of the choices we make here. As always with these things, it is how we justify our choices that is important. We warmly welcome any feedback and suggestions to improve these examples: please email ug.ppls.stats@ed.ac.uk.

Overview

The idea behind “repeated measures” is that the same variable is measured on the same set of subjects over two or more time periods or under different conditions.

The data used for this worked example are simulated to represent data from 50 participants, each measured at 3 different time-points on an outcome variable of interest. This is a fairly simple design, leading from a question such as “how does [dependent variable] change over time?” You might easily think of the 3 time points as 3 different experimental conditions instead (condition1, condition2, condition3) and ask “how does [dv] change over depending on [independent variable]?”

This is a very simple way to simulate repeated measures data structure (with long data). There are a good number of other approaches, but this will do for now as you may well be familiar with all the functions involved:

set.seed(347)
library(tidyverse)
simRPT <- tibble(
  pid = factor(rep(paste("ID", 1:50, sep=""),each=3)),
  ppt_int = rep(rnorm(50,0,10),each=3), # add some participantz random-ness
  dv = rnorm(150,c(40,50,70),sd=10) + ppt_int,
  iv = factor(rep(c("T1", "T2", "T3"), each=1, 50))
)

If you are unclear about any section of the code above, why not try running small bits of it in your console to see what it is doing?
For instance, try running:

  • paste("ID", 1:50, sep="")
  • rep(paste("ID", 1:50, sep=""),each=3)
  • factor(rep(paste("ID", 1:50, sep=""),each=3))

Data Wrangling

Because we simulated our data, it is already nice and tidy. Each observation is a row, and we have variable indicating participant id (pid).

head(simRPT)
## # A tibble: 6 × 4
##   pid   ppt_int    dv iv   
##   <fct>   <dbl> <dbl> <fct>
## 1 ID1     -1.92  41.6 T1   
## 2 ID1     -1.92  40.2 T2   
## 3 ID1     -1.92  63.6 T3   
## 4 ID2    -11.1   22.9 T1   
## 5 ID2    -11.1   40.5 T2   
## 6 ID2    -11.1   60.6 T3

Descriptives

Let’ see our summaries per time-point:

sumRPT <- 
  simRPT %>%
  group_by(iv) %>%
  summarise(
    n = n_distinct(pid),
    mean.dv = round(mean(dv, na.rm=T),2),
    sd.dv = round(sd(dv, na.rm=T),2)
    )
sumRPT
## # A tibble: 3 × 4
##   iv        n mean.dv sd.dv
##   <fct> <int>   <dbl> <dbl>
## 1 T1       50    38.8  13.1
## 2 T2       50    50.8  11.2
## 3 T3       50    67.8  13.5

We can make this a little prettier:

library(knitr)
library(kableExtra)
kable(sumRPT) %>%
  kable_styling("striped")
iv n mean.dv sd.dv
T1 50 38.84 13.10
T2 50 50.82 11.16
T3 50 67.81 13.48

Well…we knew what the answer was going to be, but there we have it, our scores improve across the three administrations of our test.

Visualizations

We can construct some simple plots showing distribution of the outcome variable at each level of the independent variable (iv):

simRPT %>% 
  ggplot(aes(x = iv, y = dv)) + 
  geom_violin() + 
  geom_jitter(alpha=.5,width=.1,height=0) + 
  labs(x="Time", y = "Test Score", 
       title="Scores across trials", 
       subtitle = "Violin Plots with (jittered) Observations")+
  theme_minimal()

So what does this show? Essentially we are plotting all responses at each point in time. The points are jittered so that they are not all overlaid on one another. The areas marked at each time point are mirrored density plots (i.e. they show the distribution of the scores at each point in time).

If you want to get an intuitive sense of these plotted areas, look at them against the mean’s and sd’s per time point calculated above.

simRPT %>% 
  ggplot(aes(as.numeric(iv), dv)) +  
  stat_summary(fun.data = mean_cl_boot, geom="ribbon", alpha=.3) + 
  stat_summary(fun.y = mean, geom="line") + 
  labs(x="Time", y = "Test Score", 
       title="Scores across trials", 
       subtitle = "Mean and Boostrapped 95% Confidence Intervals")

We can also show each participants’ trajectory over time, by using the group aesthetic mapping.

simRPT %>%
  ggplot(aes(x = iv, y = dv)) +
  geom_point(size=3, alpha=.4)+
  geom_line(aes(group=pid), alpha = .2) +
  theme_minimal()

Analysis

Equations

We’re going to fit this model, and examine the change in dv associated with moving from time-point 1 to each subsequent time-point.
Recall that because iv is categorical with 3 levels, we’re going to be estimating 2 (\(3-1\)) coefficients. \[ \begin{aligned} \operatorname{dv}_{i[j]} &= \beta_{0i} + \beta_1(\operatorname{iv}_{\operatorname{T2}_j}) + \beta_2(\operatorname{iv}_{\operatorname{T3}_j}) + \varepsilon_{i[j]} \\ \beta_{0i} &= \gamma_{00} + \zeta_{0i} \\ & \text{for }\operatorname{pid}\text{ i = 1,} \dots \text{,I} \end{aligned} \]

Fitting the models

library(lme4)

Here we run an empty model so that we have something to compare our model which includes our iv. Other than to give us a reference model, we do not have a huge amount of interest in this.

m0 <- lmer(dv ~ 1 + (1 | pid), data = simRPT)

Next, we specify our model. Here we include a fixed effect of our predictor (group membership, iv), and a random effect of participant (iv) to take account of the fact we have three measurements per person.

m1 <- lmer(dv ~ 1 + iv + (1 | pid), data = simRPT)
summary(m1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: dv ~ 1 + iv + (1 | pid)
##    Data: simRPT
## 
## REML criterion at convergence: 1144.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.52990 -0.57636 -0.02015  0.56073  2.50816 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 74.66    8.641   
##  Residual             84.60    9.198   
## Number of obs: 150, groups:  pid, 50
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   38.842      1.785   21.76
## ivT2          11.976      1.840    6.51
## ivT3          28.964      1.840   15.74
## 
## Correlation of Fixed Effects:
##      (Intr) ivT2  
## ivT2 -0.515       
## ivT3 -0.515  0.500

And we can compare our models.

library(pbkrtest)
PBmodcomp(m1, m0)
## Bootstrap test; time: 18.87 sec; samples: 1000; extremes: 0;
## large : dv ~ 1 + iv + (1 | pid)
## dv ~ 1 + (1 | pid)
##          stat df   p.value    
## LRT    126.84  2 < 2.2e-16 ***
## PBtest 126.84     0.000999 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m0,m1)
## Data: simRPT
## Models:
## m0: dv ~ 1 + (1 | pid)
## m1: dv ~ 1 + iv + (1 | pid)
##    npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## m0    3 1285.9 1294.9 -639.94   1279.9                         
## m1    5 1163.0 1178.1 -576.52   1153.0 126.83  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

OK, so we can see that we appear to have a significant effect of our repeated factor here. Our parametric bootstrap LRT is in agreement here.

Comparison to aov()

Using anova() to compare multilevel models will not give you a typical ANOVA output.
For piece of mind, it can be useful to compare how we might do this in aov()

m2 <- aov(dv ~ iv + Error(pid), data = simRPT)

Here the term Error(pid) is specifying the within person error, or residual. This is what we are doing with our random effect (1 | pid) in lmer()

And we can compare the model sums of squares from both approaches to see the equivalence:

summary(m2)
## 
## Error: pid
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 49  15121   308.6               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)    
## iv         2  21183   10591   125.2 <2e-16 ***
## Residuals 98   8291      85                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m1)
## Analysis of Variance Table
##    npar Sum Sq Mean Sq F value
## iv    2  21183   10591  125.19

Check model

The residuals look reasonably normally distributed, and there seems to be fairly constant variance across the linear predictor. We might be a little concerned about the potential tails of the plot below, at which residuals don’t appear to have a mean of zero

plot(m1, type = c("p","smooth"))

library(lattice)
qqmath(m1)

Random effects are (roughly) normally distributed:

rans <- as.data.frame(ranef(m1)$pid)
ggplot(rans, aes(sample = `(Intercept)`)) + 
  stat_qq() + stat_qq_line() +
  labs(title="random intercept")

Visualise Model

library(sjPlot)
plot_model(m1, type="pred")
## $iv

plot_model(m1, type="re") # an alternative: dotplot.ranef.mer(ranef(m1))

Interpret model

Parametric bootstrap 95% CIs:

confint(m1, method = "boot")
##                 2.5 %   97.5 %
## .sig01       5.999069 11.24815
## .sigma       7.941403 10.46011
## (Intercept) 35.397974 42.37131
## ivT2         8.313544 15.62274
## ivT3        25.286894 32.52241

Case-resample bootstrap 95% CIs:

library(lmeresampler)
bootmodel <- bootstrap(m1, fixef, type = "case", B = 999, resample = c(TRUE,FALSE))
confint(bootmodel, type = "perc")
## # A tibble: 3 × 6
##   term        estimate lower upper type  level
##   <chr>          <dbl> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept)     38.8 35.8   42.0 perc   0.95
## 2 ivT2            12.0  8.44  15.5 perc   0.95
## 3 ivT3            29.0 25.5   32.6 perc   0.95
res <- confint(bootmodel, type="perc")
res <- res %>% mutate_if(is.numeric,~round(.,2))
res
## # A tibble: 3 × 6
##   term        estimate lower upper type  level
##   <chr>          <dbl> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept)     38.8 35.8   42.0 perc   0.95
## 2 ivT2            12.0  8.44  15.5 perc   0.95
## 3 ivT3            29.0 25.5   32.6 perc   0.95

Writing up an interpretation of this is a bit clunky as we have abstract names for our variables like “dv” and “iv”, but as a rough starting point:

Change in [dv] over [iv] was modeled using a linear mixed effects model, with a fixed effect of [iv] and a by-[pid] random intercepts. At baseline, scores on [dv] were estimated at 38.84 (cluster-resample bootstrap 95% CI: 35.79–42.01). Results indicated that, relative to time-point 1, scores at time-point 2 increased by 11.98 (8.44–15.54), and at time-point 3 had increased by 28.96 (25.48–32.6).